Cis Regulatory Elements

Of fundamental importance for understanding transcriptional regulatory networks is the functional annotation of DNA regulatory motifs (typically ~6-15 bp in length) in terms of what groups of target genes they regulate in a tissue- or temporal-specific manner in response to environmental perturbations. While effective computational methods for mapping DNA regulatory motifs exist in the yeast Saccharomyces cerevisiae, where the DNA binding sites of regulatory transcription factors (TFs) typically occur within ~600 bp upstream of genes, they cannot be applied to metazoan genomes, where genes in the same expression cluster are not necessarily co-regulated by a common mechanism, and the regulatory elements can be far from the transcription start site.

In metazoans, regulatory motifs tend to co-occur within stretches of noncoding sequence, referred to as cis regulatory modules (CRMs), that regulate expression of the nearby gene(s). Numerous approaches have resulted in the successful identification of CRMs, but such approaches do not attempt to predict ab initio the gene expression patterns or functions of the genes regulated by the CRMs. Although algorithms have been developed recently for evaluating the regulatory significance of CRM binding site composition, thus far they have been unable to evaluate the vast sequence regions beyond the proximal promoter that must be considered in mammalian genomes.

We have developed computational algorithms that predict CRMs in the noncoding sequences flanking genes of interest. Our most recent algorithm, PhylCRM (Warner, Philippakis, Jaeger et al., Nature Methods 2008, 5(4):347-353), combines data for individual motif occurrences scored on an alignment into a single CRM prediction. PhylCRM can scan very long genomic sequences for candidate CRMs by quantifying both motif clustering and conservation across arbitrarily many genomes, using an evolutionary model consistent with the phylogeny of the genomes. In our study of cis regulatory elements involved in human myogenic differentiation, we examined 75 kb around the transcription start sites of genes and used the phylogenetic tree containing all 8 sequenced mammalian genomes (human, chimp, macaque, mouse, rat, dog, cow, and opossum). Candidate CRMs of varying lengths, ranging from 20 to 500 bp, are scored, and those with significant scores are identified.
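To make the idea of scanning for clusters of motif occurrences concrete, here is a minimal sketch in Python. It is not the PhylCRM algorithm and does not reproduce its statistics. It assumes each motif occurrence has already been reduced to a position and a single score (for example, one that reflects both match quality and conservation), and it uses a simple length penalty chosen only for illustration. The names `best_window` and `penalty_per_bp` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical input: (position_bp, score) for each motif occurrence. The score
# is assumed to already combine motif match quality and conservation.
Hit = Tuple[int, float]


def best_window(hits: List[Hit], min_len: int = 20, max_len: int = 500,
                penalty_per_bp: float = 0.01) -> Tuple[int, int, float]:
    """Return (start, end, score) of the best window bounded by two hits.

    Window score = sum of hit scores in the window minus a length penalty,
    so dense clusters outrank sparse ones. This scoring rule is an
    illustrative assumption, not the published PhylCRM statistic.
    """
    hits = sorted(hits)
    best = (0, 0, float("-inf"))
    for i, (start, _) in enumerate(hits):
        total = 0.0
        for pos, score in hits[i:]:
            length = pos - start + 1
            if length > max_len:
                break  # hits are sorted, so later hits are even farther away
            total += score
            if length >= min_len:
                window_score = total - penalty_per_bp * length
                if window_score > best[2]:
                    best = (start, pos, window_score)
    return best
```

For example, calling `best_window` on a list of hits in which three occurrences lie within 60 bp of each other and two others lie kilobases away selects the window spanning the three clustered hits. The 20 to 500 bp window bounds mirror the CRM length range mentioned above; every other parameter is a placeholder.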

We have applied these approaches to various systems in both mammals and Drosophila. Our Drosophila studies are a collaborative effort with Dr. Alan Michelson's lab at the National Heart, Lung, and Blood Institute (NHLBI). The results allowed us to identify novel, functional transcriptional enhancers in human myogenic differentiation and, in Drosophila, for various embryonic cell types important in the development of somatic and cardiac mesoderm. This approach is general and can be applied readily to any metazoan genome of interest.

These figures were taken from Philippakis, Busser et al., PLoS Computational Biology (2006) 2(5):e53.


This page was last updated September 1, 2008